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ABSTRACT 

The large-scale structure of the Universe is thought to evolve by a process of gravita¬ 
tional amplification from low-amplitude Gaussian noise generated in the early Universe. 

The later, non-linear stages of gravitation-induced clustering produce phase correlations 
with well-defined statistical properties. In particular, the distribution of phase differ¬ 
ences D between neighboring Fourier modes provides useful insights into the clustering 
phenomenon. Here we develop an approximate theory for the probability distribution 
D and test it using a large battery of numerical simulations. We find a remarkable 
universal form for the distribution which is well described by theoretical arguments. 

Subject headings: cosmology: theory — methods: statistical 


1. INTRODUCTION 

The large-scale structure of the Universe is a complex interconnecting pattern whose structural 
elements comprise filaments, sheets and clusters of galaxies surrounding large voids. According to 
standard theories this “cosmic web” develops by a process of gravitational instability from small 
initial fluctuations in the density of a largely homogeneous early Universe. 

The physical description of an inhomogeneous Universe revolves around the dimensionless 
density contrast, <5(x), which is obtained from the spatially-varying matter density p(x) via <5(x) = 
[p(x) — po]/po where po is the global mean density. It is useful to expand the density contrast in 
Fourier series, in which <5 is treated as a superposition of plane waves: 

<5(x) = X <5(k) exp(?'k • x). 


(1) 
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The Fourier transform <5(k) is complex and therefore possesses both amplitude C(k) = |<5(k) | and 
phase where 

<5(k) = C(k) exp(i<f? fc ) = A(k) + iB( k); (2) 

the real and imaginary parts are A and B so that A(k) = C(k) cos(<3V) and B(k) = C*(k) sin(d > fc). 
We also have C 2 ( k) = A 2 (k) + B 2 { k) and tand>^ = B(k)/A( k). 

In theories of structure formation involving cosmic inflation (Guth 1981; Guth & Pi 1982; Linde 
1982; Albrecht & Steinhardt 1982), the initial fluctuations that seeded the structure formation 
process form a Gaussian random field (Bardeen et al. 1986) possessing the properties of statistical 
homogeneity and isotropy. In such fields the real and imaginary parts of <5(k) are independent 
Gaussians so that the modulus C(k) has a Rayleigh distribution and the phases <f>fc are uniformly 
random on the interval [0, 27 t] . Obviously the distribution of <5 in position space will also be 
Gaussian; indeed all finite-dimensional joint probabilities of 5 in different locations are multivariate 
Gaussian in this case (Bardeen et al. 1986). 

Even if the primordial density fluctuations were indeed Gaussian, the later stages of gravita¬ 
tional clustering must induce some form of non-linearity. One particular way of looking at this 
issue is to study the behavior of Fourier modes of the cosmological density field. If the hypothesis 
of primordial Gaussianity is correct then these modes began with random spatial phases. In the 
early stages of evolution, the plane-wave components of the density evolve independently like linear 
waves on the surface of deep water. As the structures grow in mass, they interact with other in 
non-linear ways, more like waves breaking in shallow water. These mode-mode interactions lead to 
the generation of coupled phases. While the Fourier phases of a Gaussian field contain no informa¬ 
tion (they are random), non-linearity generates non-random phases that contain much information 
about the spatial pattern of the fluctuations but which is ignored entirely in the usual clustering 
descriptors, such as the power spectrum (see below). There have been a number of attempts to gain 
quantitative insight into the behavior of phases in gravitational systems. Some studies (Ryden &; 
Gramann 1991; Soda & Suto 1992; Jain & Bertschinger 1998) have concentrated on the evolution 
of phase shifts for individual modes using perturbation theory and numerical simulations. An alter¬ 
native approach was adopted by Scherrer, Melott & Shandarin (1991), who developed a practical 
method for measuring the phase coupling in random fields that could be applied to real data. More 
recent studies have established connections between phase evolution, clustering dynamics and mor¬ 
phology (Chiang & Coles 2000; Chiang 2001; Chiang, Coles, & Naselsky 2002) and new methods 
have been developed for visualizing phase information (Coles & Chiang 2000). Connections have 
also been demonstrated (Watts & Coles 2003) between phase information and alternative measures 
of non-linear clustering such as the bispectrum (Scoccimarro, Couchman & Frieman 1999; Verde 
et al. 2000, 2001). 

One of the major barriers to the more widespread use of phase-based methods for probing 
cosmic nonlinearity is the lack of any well-established statistical framework for quantifying the 
information contained in the Fourier phases. Since phases are angular variables traditional statis¬ 
tical measures of location and dispersion are inappropriate. Moreover, probability distributions for 
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circular variables have very different properties to those for variables defined on the real line; see 
the monographs by Mardia (1972) and Fisher (1993) for more detailed discussion. The upshot of 
this is that it is not yet known whether there are useful standard distributions and limit theorems 
like those that make the Gaussian distribution so useful and so ubiquitous for statistical analysis. 
It is this deficiency we address in this letter. 


2. STATISTICAL DESCRIPTION 

Non-linear clustering is difficult to handle rigorously with analytic methods. Our approach is 
therefore to develop an approximate theory and test it using numerical experiments. The simula¬ 
tions we shall be comparing to theory are constructed within a finite cubic volume with periodic 
boundary conditions. The Fourier representation of the clustering pattern they reveal is therefore 
discrete. In particular phases are defined for wave-vectors on a cubic lattice which, for simplicity, 
we take to have an integer spacing. In previous analyses of phase coupling phenomena (Chiang & 
Coles 2000; Chiang 2001; Chiang, Coles, & Naselsky 2002; Coles Sz Chiang 2000) it has been found 
useful to introduce a quantity Dk, defined by 

Dk = d'fc+1 - (3) 

which measures the difference in phase of modes with neighboring wave-numbers in one dimension. 
This has many advantages as a descriptor, not the least of which is that a translation of the origin 
by x which alters each &k by kx only produces a constant offset in Dk- One can analyze a three- 
dimensional simulation by extracting a vector (D±, D 2 , D%) from differences in three orthogonal 
fc-space directions; the result contains information about both the location and structure of features 
in ( x,y,z ) space. One can think of Dk as a discrete representation of d&k/dk, the phase gradient. 
Note that if the two angles d>i and d >2 are independent and uniformly random then the difference 
$1 — <3? 2 is also uniformly random, so that Dk will be uniformly distributed for Gaussian fields. 

When fluctuations are small (5 « 1) they evolve linearly: the initial statistics are preserved in 
this regime because each mode evolves independently. When 5 ~ 1, however, mode-coupling terms 
alter the distributions of both amplitudes and phases. Strictly speaking, therefore, the real and 
imaginary parts of the Fourier representation of 5 in this regime are no longer Gaussian. However 
the form of the Fourier expansion (1) itself guarantees that, as long as the autocorrelations of 5 do 
not have too large a spatial extent, the Fourier superposition will be approximately Gaussian. It 
is therefore still a reasonable approximation to take A and B to be Gaussian even for non-linear 
systems (Fan & Bardeen 1995). More important for our purposes is the fact that, while for Gaussian 
fields each /c-rnode is statistically independent, a non-linear field contains mode-mode correlations 
(Soccimarro, Zaldarriaga & Hui 1999). These induce phase correlations which manifest themselves 
in departures from uniformity of the distribution of Dk- 

We can model the effect by assuming that the two neighboring modes involved in (3) have real 
and imaginary parts that are Gaussian distributed. For notational ease we suppress the label k and 
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so the real parts of each are written A\ and A 2 and the imaginary parts B i, i? 2 - We assume these 
can be approximated as Gaussians but we allow them to be cross-correlated. The four variables 
{A\, Bi, A 2 , B 2 ) therefore possess a four-dinrensional multivariate Gaussian distribution. 

If a set of random variables X\ ... have a multivariate Gaussian distribution the joint 
probability density function of the variables has the form 

V N (X U ...,X N ) = lexp(-i^W M-” 1 X 3 ), (4) 

i,3 


where K = (27r) Ar / 2 (det M) 1//2 and the correlation matrix M l3 = ( X^Xj ). For the case we are 
interested in we can write (A 2 ) = (Bf) = a 2 for i = 1,2. The quantity a 2 {k) is related to 
the power-spectrum, P(k ) defined by P(k) = (|<5(k)| 2 ). We can also define a quantity a by 
(. AiAj) = (BiBj) = (Tiaja where a(k) parameterizes the cross-correlation of the modes. We 
also have ( AiBi) = 0 (no summation), guaranteeing translation invariance (Chiang & Coles 2000), 
while ( A{Bj) = — {BiAj) = in a similar fashion to a. Note that a and j3 may well be equal 

but we have kept the most general possible form here. These two parameters allow us to construct 
the required distribution for V(A\, A 2 , B\, -B 2 ). We now convert the distribution V of the vector 
(j4i, A 2 , B\, B 2 ) to the distribution T of the vector (Ci, $ 1 , C 2 , $ 2 )- The result is 


P = 


C1C2 

4vrcr 2 cj 2 (1 - x 2 ) 


exp(—<2/2), 


( 5 ) 


where 

^ _ crlC 1 + a\C\ - ‘lo\ 02 C\C 2 X cos(4>i - 4> 2 + 0) u ., 

V — 2 2 /i 9 \ ' v^/ 

(1 - X 2 ) 

In these expressions we have used x 2 = a 2 + (3 2 and tan 0 = a//3. Note that if x = 0 (no mode 
correlations) this reduces to the product of two Rayleigh distributions for C\ and C 2 and two 
uniform distributions of dq and $ 2 . 


The next step is to construct a conditional distribution of one of the phase angles, say $ 2 , 
given specific values for the other three variables, which we designate as ci,C 2 and (f>i. This can be 
done straightforwardly using Bayes’ theorem. The result is 

Q (4?21ci , ( f>i , c 2 ) = - ~7 t exp [— kcos(</>i - $2 + 0 )], (7) 

where Jo is a modified Bessel function of order zero and in which k = ciC 2 x/ (T i £T 2 (l — X 2 )- This 
conditional distribution shows how the correlation between neighboring phases arises since it de¬ 
pends upon (f>\. It is deficient, however, in that this distribution also depends on specified values 
for the amplitudes C\ and (q. 

A better theory might be obtained by marginalizing over these variables, but the integrals 
involved are messy. On the other hand the distribution of amplitudes is relatively narrow, peaking 
around c\ = <J\ and C 2 = 02 and x is presumably small for quasi-nonlinear stages. Under these 
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circumstances we expect the distribution of phase differences formed over a large number of pairs 
to follow the form given above, i.e. 

p ( D ) = 9 77 x exp [-« cos (D - //)]. ( 8 ) 

27t/o(k) 

where fi is the mean angle. This distribution is well-known in the field of circular statistics where 
it is known as the von Mises distribution (Mardia 1972; Fisher 1993). The mean, //, is controlled 
by the positions of individual features of the distribution and will consequently vary from sample 
to sample. The parameter n, related to \i describes the level of nonlinearity. When k —* 0 
is small the distribution is approximately uniform, while for small k it takes the form P{D) ~ 
2^[1 + k cos (D — fi)\ showing that initial departures from random phases manifest themselves as a 
sinusoidal perturbation of P(D). In the limit k —► oo the distribution tends towards a single spike 
at 9 = n; this corresponds to a single concentration in position space. 


3. NUMERICAL SIMULATIONS 

To test our hypothesis with fairly complete coverage of possible parameters, we used a large 
ensemble of gravitational clustering simulations (Melott & Shandarin 1993). These comprise sets 
of 128 3 particles and are not large by current standards, but the particular benefit they offer our 
analysis is a fairly complete coverage of parameter space. There are four realizations of each type of 
initial conditions, using different pseudo-random number generators to generate the initial phases. 
Initial power spectra were pure power laws, i.e. P(k) = Ak n with indices n = — 2 , — 1 , 0, and 1. 
Data is taken every time the scale of clustering doubles, from k n \ = 64 kf to k n \ = Akf, where kf is 
the fundamental mode of the box and k n \ is defined by 

fk nl 

/ P{k)d 3 k = 1. (9) 

Jo 

The first stage may be significantly compromised by resolution effects; we have simulations with 
k nl = 2kf but these almost certainly suffer from problems connected with the boundary conditions 
(which are, as usual, periodic). 

Analysis of the properties of the phase differences was conducted on each realization of each 
spectrum and evolutionary phase, i.e. a total of 120 times altogether. We Fourier-transform 
each stage and extract phases for each wave-vector k. From these we obtain differences Dj. in 
three orthogonal k- space directions from which we form histograms. The mean value fi of each 
distribution contains information about the specific spatial location of dominant features (Chiang 
& Coles 2000), which will differ from realization to realization and which also varies with direction. 
We therefore rotate individual distributions so that they have the same mean value and “stack” the 
resulting histograms. We also combine all three directional differences into an overall histogram 
P(D ) where D = (Df + D\ + D^) 1 / 2 /y/3. This approach, together with the large size of the 
ensemble, produces final histograms with relatively small error bars, as can be seen in Fig 1. Some 
readers may find it difficult to see the data, as it overlays the theoretical model with high precision. 
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4. DISCUSSION 

The results show extremely good agreement over the range of initial power-spectra and evo¬ 
lutionary stages, although the model does break down at late stages for the case n = —2. This is 
not surprising, given the fact that such spectra have large amounts of power on large scales and 
phase correlations therefore develop extremely quickly. Even the earliest stage shown of this simu¬ 
lation shows a significantly non-uniform distribution of D. The development of phase correlations 
with evolutionary stage for each initial spectrum is represented by the increasing deviation from 
uniformity down each column starting, as expected, with sinusoidal departures. 

It is important to check whether deviations in the expansion rate or non power-law initial power 
spectra would alter these conclusions. Another ensemble of simulations used the power spectrum 
with CDM shape parameter (Bardeen et al. 1986) T = ilh = 0.225. evolved in both an open 
(Q m0 = 0.34, OCDM), a flat matter-dominated (f! m o = 1) TCDM), and a flat cosmological-constant 
background (Q. m o = 0.34, = 0.66, LCDM) All three ensembles were run to an amplitude 

corresponding to as = 0.93. A Hubble Constant h = 2/3 was used in the simulation analysis. These 
N-body runs had three realizations each of 256 3 particles in boxes of side 128 Mpc. The results 
are shown in figure 2; we found they are also nicely fit with the von Mises distribution. Results 
are shown also at z = 1. which further varies the normalization and background parameters, again 
with no significant deviation. 

The excellent match of the distribution (8) to the results of detailed numerical simulations may 
appear surprising given the very approximate nature of its derivation. But its validity is reinforced 
by the fact that it is the maximum entropy distribution on a circle for a fixed mean [i and fixed 
circular dispersion (Mardia 1972). It should really therefore be regarded as the circular equivalent 
of a Gaussian distribution, which has maximal entropy for fixed mean and variance on the real line. 

The universal behavior we have demonstrated will allow us in future to discriminate between 
gravitationally-induced mode-coupling and other forms, such as that induced by peculiar motions 
(Melott et al. 1998). It should be pointed out, of course, that in a realistic application to a sur¬ 
vey of galaxies, further coupling between the Fourier modes would arise due to the geometry and 
selection function of the survey. However, this need not be problematic. If one knew the window 
function sufficiently accurately its Fourier transform could be computed and a correction applied 
to the complex Fourier components themselves. 


This work was supported by PPARC grant PPA/G/S/1999/00660; ALM gratefully acknowl¬ 
edges the support of NSF through grant AST-0070702, and computing support from the National 
Center for Supercomputing Applications. 
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Fig. 2.— Histograms P(D ) of phase differences D are shown in grey along with the analytical 
model (8) for three different CDM models as functions of epoch. 




